Data Weeding Techniques Applied to Roget's Thesaurus
Authors
Abstract
It can be difficult to automatically generate “nice” graphical representations of concept lattices derived from lexical databases such as Roget’s Thesaurus, because these data sources tend to be large and complex. This paper discusses a variety of “data weeding” techniques that can be applied to reduce the size of a concept lattice, first in general and then with respect to Roget’s Thesaurus. The aim is for the resulting lattices to display neither too much nor too little information, regardless of which search terms a user has entered.
Similar resources
A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity
This paper presents the results of using Roget's International Thesaurus as the taxonomy in a semantic similarity measurement task. Four similarity metrics were taken from the literature and applied to Roget's. The experimental evaluation suggests that the traditional edge counting approach does surprisingly well (a correlation of r=0.88 with a benchmark set of human similarity judgements, with...
Evaluation of Automatic Updates of Roget's Thesaurus
Keywords: lexical resources, Roget's Thesaurus, WordNet, semantic relatedness, synonym selection, pseudo-word-sense disambiguation, analogy. Thesauri and similarly organised resources attract increasing interest from Natural Language Processing researchers. Thesauri age fast, so there is a constant need to update their vocabulary. Since a manual update cycle takes considerable time, autom...
Disambiguating Hypernym Relations for Roget's Thesaurus
Roget’s Thesaurus is a lexical resource which groups terms by semantic relatedness. A shortcoming of Roget’s is that its relations are ambiguous: it does not name them, but only shows that a relation exists between terms. Our work focuses on disambiguating hypernym relations within Roget’s Thesaurus. Several techniques of identifying hypernym relations are compared and contrasted in this...
Roget2000: a 2D hyperbolic tree visualization of Roget's Thesaurus
Thesauri, such as Roget’s Thesaurus, show the semantic relationships among terms and concepts. Understanding these relationships can lead to a greater understanding of linguistic structure and could be applied to creating more efficient natural-language recognition and processing programs. A general assumption is that focus and context displays of hyperbolic trees accelerate browsing ability ov...